Introduction



Machine Learning Model Evaluation Dashboard


Background


Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. People regularly quantify how much of a particular activity they do, but they rarely quantify how well they do it. This project uses data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants, who were asked to perform barbell lifts correctly and incorrectly in 5 different ways.

Objective


The goal of this project is to predict the manner in which each subject performed the exercise. This is the classe variable in the training set. This report describes how the model was built, explains the cross-validation method used, estimates the out-of-sample error, and justifies the model choices. The model built on the training set will then be used to predict 20 different test cases.

Data


The training data is available here.

The testing data is available here.

The data for this project come from this source.

If you use this data for any purpose, please cite the source!

Methodology


This analysis will attempt to classify the data using three types of robust classification algorithm: random forest, boosting, and support vector machines (with linear and radial kernels).

These models have been selected for their high accuracy and manageable complexity. Other algorithms available through the caret package, such as mxnet, could offer higher accuracy; however, due to hardware and time limitations, they are outside the scope of this project.

These models will be built on a 70% partition of the training data set and benchmarked against a validation set made up of the remaining 30%, before being used to predict the 20 unlabeled test examples.
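The 70/30 split described above could be produced with caret's createDataPartition, which samples within each class so that both partitions keep similar class proportions. The following is only a sketch: the built-in iris data stands in for the project's training set, Species plays the role of classe, and the seed and object names are assumptions, not taken from the original analysis.

```r
library(caret)

set.seed(1234)  # hypothetical seed, only so the split is reproducible

# Stand-in data: iris replaces the project's training set and
# Species replaces the classe outcome.
in_train <- createDataPartition(iris$Species, p = 0.70, list = FALSE)

training   <- iris[in_train, ]   # roughly 70% of rows, used to fit the models
validation <- iris[-in_train, ]  # remaining rows, held out for benchmarking

# Class proportions should be close in both partitions
prop.table(table(training$Species))
prop.table(table(validation$Species))
```

Because the validation rows are never seen during fitting, accuracy measured on them can serve as an estimate of out-of-sample performance.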

These models will be cross-validated using the following train control settings: trainControl(method = "cv", number = 5), i.e. 5-fold cross-validation. The only other modified parameter is the number of trees grown by the random forest model, which has been limited to 80 in order to reduce computational cost.
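As an illustration of how these settings fit together, the sketch below passes the train control object to caret's train and forwards ntree = 80 to the underlying randomForest call. It assumes the randomForest package is installed and again uses iris as a stand-in for the project data; the seed and object names are hypothetical.

```r
library(caret)

set.seed(1234)  # hypothetical seed

# 5-fold cross-validation, as stated in the text
ctrl <- trainControl(method = "cv", number = 5)

# Stand-in data: iris / Species replace the training partition / classe.
# Arguments caret does not recognise (here ntree) are passed on to randomForest().
rf_fit <- train(
  Species ~ .,
  data      = iris,
  method    = "rf",
  trControl = ctrl,
  ntree     = 80
)

# Cross-validated accuracy for each candidate value of mtry
rf_fit$results
```

caret tunes mtry (the number of randomly selected predictors tried at each split) across the folds, which is what a plot of cross-validation accuracy against the number of random predictors summarises.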






Evaluation of Random Forest Model

Evaluation of Accuracy

Cross-Validation Accuracy vs Number of Random Predictors

Confusion Matrix

$table
          Reference
Prediction    A    B    C    D    E
         A 1674    0    0    0    0
         B    7 1127    5    0    0
         C    0    5 1019    2    0
         D    0    0    7  954    3
         E    0    0    0    2 1080

$overall
      Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
     0.9947324      0.9933361      0.9925313      0.9964182      0.2856415 
AccuracyPValue  McnemarPValue 
     0.0000000            NaN 
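The overall statistics can be recomputed directly from the table, which also yields the out-of-sample error estimate. The base R sketch below copies the counts from the random forest matrix above; the variable names are illustrative, and the interval is computed with binom.test on the assumption that this matches the exact binomial interval reported by caret.

```r
# Counts copied from the random forest confusion matrix (rows as printed)
cm <- matrix(
  c(1674,    0,    0,   0,    0,
       7, 1127,    5,   0,    0,
       0,    5, 1019,   2,    0,
       0,    0,    7, 954,    3,
       0,    0,    0,   2, 1080),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = LETTERS[1:5], Reference = LETTERS[1:5])
)

n        <- sum(cm)                 # validation cases
correct  <- sum(diag(cm))           # cases on the diagonal
accuracy <- correct / n

# Cohen's kappa: agreement beyond what the marginals alone would give
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

# Exact binomial 95% interval for the accuracy
ci <- binom.test(correct, n)$conf.int

# Estimated out-of-sample error
oos_error <- 1 - accuracy

round(c(accuracy = accuracy, kappa = kappa,
        lower = ci[1], upper = ci[2], oos_error = oos_error), 7)
```

With 5854 of 5885 validation cases on the diagonal, the estimated out-of-sample error for the random forest is roughly 0.5%.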

Evaluation of Boosting Model

Accuracy vs Boosting Iterations

Confusion Matrix

$table
          Reference
Prediction    A    B    C    D    E
         A 1656   10    3    5    0
         B   38 1060   39    2    0
         C    0   35  979    9    3
         D    1    4   32  918    9
         E    7   11    9   15 1040

$overall
      Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
  9.605777e-01   9.501083e-01   9.552874e-01   9.654049e-01   2.892099e-01 
AccuracyPValue  McnemarPValue 
  0.000000e+00   7.638490e-09 

$byClass
         Sensitivity Specificity Pos Pred Value Neg Pred Value Precision
Class: A   0.9729730   0.9956969      0.9892473      0.9890762 0.9892473
Class: B   0.9464286   0.9834208      0.9306409      0.9873578 0.9306409
Class: C   0.9218456   0.9902550      0.9541910      0.9829183 0.9541910
Class: D   0.9673340   0.9906807      0.9522822      0.9937005 0.9522822
Class: E   0.9885932   0.9913097      0.9611830      0.9975016 0.9611830
            Recall        F1 Prevalence Detection Rate Detection Prevalence
Class: A 0.9729730 0.9810427  0.2892099      0.2813934            0.2844520
Class: B 0.9464286 0.9384683  0.1903144      0.1801189            0.1935429
Class: C 0.9218456 0.9377395  0.1804588      0.1663551            0.1743415
Class: D 0.9673340 0.9597491  0.1612574      0.1559898            0.1638063
Class: E 0.9885932 0.9746954  0.1787596      0.1767205            0.1838573
         Balanced Accuracy
Class: A         0.9843349
Class: B         0.9649247
Class: C         0.9560503
Class: D         0.9790074
Class: E         0.9899515
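The per-class figures follow from the confusion matrix by simple ratios of the diagonal to the row and column totals. The sketch below recomputes a few of them in base R from the boosting table as printed; object names are illustrative.

```r
# Counts copied from the boosting confusion matrix (rows as printed)
boost_cm <- matrix(
  c(1656,   10,   3,   5,    0,
      38, 1060,  39,   2,    0,
       0,   35, 979,   9,    3,
       1,    4,  32, 918,    9,
       7,   11,   9,  15, 1040),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = LETTERS[1:5], Reference = LETTERS[1:5])
)

n  <- sum(boost_cm)
tp <- diag(boost_cm)

sensitivity <- tp / colSums(boost_cm)          # TP / column ("Reference") total
ppv         <- tp / rowSums(boost_cm)          # TP / row ("Prediction") total
specificity <- (n - rowSums(boost_cm) - colSums(boost_cm) + tp) /
               (n - colSums(boost_cm))         # TN / (TN + FP)
f1          <- 2 * sensitivity * ppv / (sensitivity + ppv)
balanced    <- (sensitivity + specificity) / 2

round(cbind(sensitivity, specificity, ppv, f1, balanced), 7)

# Row totals of the printed table
rowSums(boost_cm)
```

One detail worth checking: the row totals here (1674, 1139, 1026, 964, 1082) are identical to those of the random forest and both SVM tables, while the column totals and AccuracyNull differ between models. On a shared validation set the observed class counts are fixed, so this pattern would be consistent with the observed labels having been passed as the first argument of confusionMatrix (whose signature is confusionMatrix(data, reference)). If so, overall accuracy and kappa are unaffected, but the values labeled Sensitivity and Pos Pred Value would be interchanged.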

Evaluation of Support Vector Machine Model

Diagnostics for Linear Support Vector Machine

Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 1 

Linear (vanilla) kernel function. 

Number of Support Vectors : 7225 

Objective Function Value : -1450.051 -1275.306 -1046.239 -628.1268 -1326.171 -881.7941 -1774.954 -1207.525 -1037.249 -1209.01 
Training error : 0.210526 

Diagnostics for Radial Support Vector Machine

Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 1 

Gaussian Radial Basis kernel function. 
 Hyperparameter : sigma =  0.0138342908497241 

Number of Support Vectors : 7042 

Objective Function Value : -1128.491 -836.3012 -734.9416 -438.009 -1048.849 -589.373 -761.5491 -1012.77 -718.7094 -628.9886 
Training error : 0.068501 
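The diagnostics above are printed ksvm objects from the kernlab package. One way such models could be fitted through caret is with the svmLinear and svmRadial methods, sketched below under the assumption that kernlab is installed; iris again stands in for the project data, and the seed and object names are hypothetical rather than taken from the original analysis.

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 5)

# Stand-in data: iris / Species replace the training partition / classe.
set.seed(1234)  # hypothetical seed
svm_linear <- train(Species ~ ., data = iris,
                    method = "svmLinear", trControl = ctrl)

set.seed(1234)
svm_radial <- train(Species ~ ., data = iris,
                    method = "svmRadial", trControl = ctrl)

# The fitted kernlab objects; printing them gives diagnostics in the
# same layout as above (kernel, cost C, support vectors, training error).
svm_linear$finalModel
svm_radial$finalModel
```

The linear kernel can only separate classes with hyperplanes in the original feature space, whereas the radial basis kernel allows curved boundaries, which may explain the lower training error of the radial model.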

Confusion Matrix Linear Support Vector Machine

$table
          Reference
Prediction    A    B    C    D    E
         A 1550   27   34   55    8
         B  143  824   68   22   82
         C   92   94  804   19   17
         D   68   30  112  707   47
         E   73  131   70   52  756

$overall
      Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
  7.886151e-01   7.310874e-01   7.779569e-01   7.989865e-01   3.272727e-01 
AccuracyPValue  McnemarPValue 
  0.000000e+00   3.543329e-53 

Confusion Matrix Radial Support Vector Machine

$table
          Reference
Prediction    A    B    C    D    E
         A 1660    5    7    2    0
         B   87  997   52    1    2
         C    4   41  956   24    1
         D    6    3  105  849    1
         E    6   13   57   24  982

$overall
      Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
  9.250637e-01   9.050519e-01   9.180390e-01   9.316635e-01   2.995752e-01 
AccuracyPValue  McnemarPValue 
  0.000000e+00   2.353217e-41
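To compare the four models side by side, the validation accuracies reported above can be converted into out-of-sample error estimates and ranked. The base R sketch below uses only figures printed in this report; the vector and variable names are illustrative.

```r
# Validation accuracies copied from the $overall outputs above
validation_accuracy <- c(
  random_forest = 0.9947324,
  boosting      = 0.9605777,
  svm_radial    = 0.9250637,
  svm_linear    = 0.7886151
)

# Estimated out-of-sample error for each model, smallest first
oos_error <- sort(1 - validation_accuracy)
round(100 * oos_error, 2)  # as percentages

best_model <- names(oos_error)[1]
best_model

# For the SVMs, compare with the training error printed in the diagnostics
svm_training_error <- c(svm_radial = 0.068501, svm_linear = 0.210526)
gap <- oos_error[names(svm_training_error)] - svm_training_error
round(gap, 4)
```

On these figures the random forest has the lowest estimated out-of-sample error (about 0.5%), followed by boosting (about 3.9%), the radial SVM (about 7.5%), and the linear SVM (about 21.1%). The small gap between training and validation error for both SVMs suggests their weaker results reflect model fit rather than overfitting.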